# Multilingual Speech Translation
Ultravox V0 3
MIT
Ultravox is a multimodal speech large language model based on Llama3.1-8B-Instruct and Whisper-small, capable of processing both speech and text inputs.
Audio-to-Text
Transformers English

U
FriendliAI
20
1
Ultravox V0 5 Llama 3 3 70b
MIT
Ultravox is a multimodal voice large language model built upon Llama3.3-70B and Whisper, supporting both voice and text inputs, suitable for scenarios like voice agents and translation.
Audio-to-Text
Transformers Supports Multiple Languages

U
fixie-ai
3,817
26
Ultravox V0 4 1 Llama 3 3 70b
MIT
Ultravox is a multimodal speech large language model based on Llama3.3-70B-Instruct and whisper-large-v3-turbo, capable of processing both speech and text inputs.
Audio-to-Text
Transformers Supports Multiple Languages

U
fixie-ai
26
10
Ultravox V0 4 1 Mistral Nemo
MIT
Ultravox is a multimodal model based on Mistral-Nemo and Whisper, capable of processing both speech and text inputs, suitable for tasks like voice agents and speech translation.
Audio-to-Text
Transformers Supports Multiple Languages

U
fixie-ai
1,285
25
Ultravox V0 4 1 Llama 3 1 70b
MIT
Ultravox is a multimodal speech large language model, built upon the pre-trained Llama3.1-70B-Instruct and whisper-large-v3-turbo backbones, capable of receiving both speech and text as inputs.
Text-to-Audio
Transformers Supports Multiple Languages

U
fixie-ai
204
24
Ultravox V0 4 1 Llama 3 1 8b
MIT
Ultravox is a multimodal speech large language model built on Llama3.1-8B-Instruct and whisper-large-v3-turbo, capable of processing both speech and text inputs.
Audio-to-Text
Transformers Supports Multiple Languages

U
fixie-ai
747
97
Hf Seamless M4t Large
SeamlessM4T is a unified model supporting multilingual speech and text translation, capable of performing speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation tasks.
Text-to-Audio
Transformers

H
facebook
4,648
57
Hf Seamless M4t Medium
SeamlessM4T is a multilingual translation model that supports both speech and text input/output, enabling cross-language communication.
Text-to-Audio
Transformers

H
facebook
14.74k
30
Wav2vec2 Xls R 2b 22 To 16
Apache-2.0
Facebook's Wav2Vec2 XLS-R model fine-tuned for multilingual speech translation tasks, supporting mutual translation between 22 input languages and 16 output languages.
Speech Recognition
Transformers Supports Multiple Languages

W
facebook
38
14
Featured Recommended AI Models